Improving Arabic Cognitive Distortion Classification in Twitter using BERTopic
Authors
Abstract
Social media platforms allow users to share thoughts, experiences, and beliefs. These platforms represent a rich resource for natural language processing techniques to make inferences in the context of cognitive psychology. Some inaccurate and biased thinking patterns are defined as cognitive distortions (CDs). Detecting these distortions helps users restructure how they perceive thoughts in a healthier way. This paper proposes a machine learning-based approach to improve the classification of cognitive distortions in Arabic content on Twitter. One of the challenges facing this task is text shortness, which results in sparse co-occurrence patterns and a lack of context information (semantic features). The proposed approach enriches the text representation by defining the latent topics within tweets. Although classification is a supervised learning concept, the enrichment step uses unsupervised learning. The proposed algorithm utilizes transformer-based topic modeling (BERTopic). It employs two types of document representations and performs averaging and concatenation to produce contextual topic embeddings. A comparative analysis of F1-score, precision, recall, and accuracy is presented. The experimental results demonstrate that the enriched representation outperformed the baseline models at different rates. These encouraging results suggest that using the latent topic distribution obtained from the BERTopic technique can improve the classifier's ability to distinguish between the different CD categories.
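The abstract mentions averaging and concatenating document representations but gives no formulas, so the following is only a minimal sketch of what such an enrichment step could look like, not the authors' implementation. It assumes two equal-length document vectors and a per-document topic distribution. The function names (`average`, `concatenate`, `enrich`) and all numeric values are hypothetical, and no BERTopic calls are made: the topic distribution is hard-coded as a stand-in for what a topic model would output.

```python
from typing import List, Sequence

Vector = List[float]


def average(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise mean of two equal-length document representations."""
    if len(a) != len(b):
        raise ValueError("averaging requires vectors of equal length")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def concatenate(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Join two vectors end to end; lengths may differ."""
    return list(a) + list(b)


def enrich(rep_a: Sequence[float],
           rep_b: Sequence[float],
           topic_dist: Sequence[float]) -> Vector:
    """Hypothetical enrichment: average the two document representations,
    then append the topic distribution as extra features.
    This ordering is an assumption made for illustration only."""
    return concatenate(average(rep_a, rep_b), topic_dist)


# Made-up values standing in for two embeddings of one tweet
# and its topic distribution over three latent topics.
rep_a = [0.5, 1.0, -0.25]
rep_b = [1.5, 0.0, 0.75]
topic_dist = [0.75, 0.25, 0.0]

enriched = enrich(rep_a, rep_b, topic_dist)
print(enriched)
```

The enriched vector would then be passed to any supervised classifier. Averaging keeps the dimensionality fixed, while concatenation preserves both sources of information at the cost of a longer feature vector.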
Similar resources
Improved Micro-blog Classification for Detecting Abusive Arabic Twitter Accounts
The increased use of social media in Arab regions has attracted spammers seeking new victims. Spammers use accounts on Twitter to distribute adult content in Arabic-language tweets, yet this content is prohibited in these countries due to Arabic cultural norms. These spammers succeed in sending targeted spam by exploiting vulnerabilities in content-filtering and internet censorship systems, pri...
Using Twitter to Collect a Multi-Dialectal Corpus of Arabic
This paper describes the collection and classification of a multi-dialectal corpus of Arabic based on the geographical information of tweets. We mapped information of user locations to one of the Arab countries, and extracted tweets that have dialectal word(s). Manual evaluation of the extracted corpus shows that the accuracy of assignment of tweets to some countries (like Saudi Arabia and Egyp...
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of “seminar users”, who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus, our approach could work for detecti...
Arabic to English Person Name Transliteration using Twitter
Social media outlets are providing new opportunities for harvesting valuable resources. We present a novel approach for mining data from Twitter for the purpose of building transliteration resources and systems. Such resources are crucial in translation and retrieval tasks. We demonstrate the benefits of the approach on Arabic to English transliteration. The contribution of this approach includ...
Sensing Real-World Events Using Arabic Twitter Posts
In recent years, there has been increased interest in event detection using data posted to social media sites. Automatically transforming user-generated content into information relating to events is a challenging task due to the short informal language used within the content and the variety of topics discussed on social media. Recent advances in detecting real-world events in English and othe...
Journal
Journal title: International Journal of Advanced Computer Science and Applications
Year: 2022
ISSN: 2158-107X, 2156-5570
DOI: https://doi.org/10.14569/ijacsa.2022.0130199